ABSTRACT


The  sequencing  of  the  human  genome  has  revealed  that  only  2%  of  the  genome  actually  codes  for  protein. 
However,  the  remainder  of  the  genome  is  not  “junk”  and  it  has  recently  been  revealed  that  most  of  the  genome 
is  transcriptionally  active.  We  utilized  a  tiling  array  approach  to  examine  the  entire  genome  for  transcriptional 
activity  and  found  a  large  number  of  non-coding  transcripts.  When  we  originally  proposed  the  work  in  this 
Concept  Award,  we  proposed  to  study  a  group  of  long,  highly  conserved,  non-coding  transcripts  which  had 
altered  expression  and  sometimes  mutations  in  breast  cancer.  We  have  subsequently  found  that  many  of  these 
highly  conserved  transcripts  are  either  part  of  known  genes  or  highly  homologous  to  known  genes.  However, 
we’ve  now  identified  two  new  groups  of  novel  non-coding  transcripts  which  are  not  part  of  genes.  The  first 
group  is  non-coding  transcripts  that  have  increased  expression  in  response  to  the  DNA  damage  induced  by  the 
carcinogen NNK. These NNK-induced transcripts (NiTs) are all over 300 nucleotides long and have altered
expression in breast cancer. We have validated these transcripts and have used Northern blots to determine
their precise sizes, which range between 500 and 1,500 base pairs. The second group
we  identified  by  analyzing  breast  cancer  cell  lines  with  tiling  arrays  searching  for  non-coding  transcripts  that 
had  consistently  altered  expression.  We’ve  now  identified  a  group  of  breast  cancer  non-coding  transcripts 
(bcNCTs).  These  have  been  validated  in  a  larger  panel  of  breast  cancer  cell  lines.  In  this  final  report  for  this 
Concept  Award,  we  describe  our  selection  of  one  of  the  NiTs  (NiT  5)  and  one  of  the  bcNCTs  (bcNCT  28),  as 
well  as  our  preliminary  characterization  of  these  two  long  non-coding  transcripts.  We  also  describe  our  work  to 
determine  the  role  that  these  long  non-coding  transcripts  play  in  the  development  of  breast  cancer. 
INTRODUCTION


Only  2%  of  the  human  genome  actually  codes  for  protein.  In  spite  of  this,  it  turns  out  that  most  of  the  genome  is 
transcriptionally active. The question we set out to answer is what the functions of these non-coding transcripts
are and what role, if any, they play in the development of breast cancer. Our major hypothesis is that these
non-coding transcripts play an important regulatory role within cells and would be important targets of
alteration during the development of breast cancer.

We  decided  to  focus  our  efforts  on  a  group  of  long  (greater  than  300  nts)  non-coding  transcripts  that  we 
identified  by  using  a  tiling  array  approach  to  identify  transcriptional  units  across  the  genome.  We  specifically 
focused  on  the  most  highly  conserved  of  these  non-coding  transcripts  and  had  preliminary  evidence  that  several 
of  these  were  also  targets  of  alteration  (both  in  terms  of  expression  and  as  occasional  mutational  targets)  in 
breast  cancer.  We  proposed  to  characterize  two  of  these  (NiT4  and  NiT5)  as  they  were  the  most  highly 
conserved. 

However,  subsequent  analysis  of  the  highly  conserved  non-coding  transcripts  has  revealed  that  most  of  the  more 
highly conserved transcripts actually had homology to existing coding transcripts. Indeed, results of the
ENCODE project now reveal that each coding gene produces, on average, about 5 distinct transcripts and that
there  is  much  greater  complexity  to  gene  organization  than  previously  anticipated.  In  addition,  the  simple  model 
of  genes  being  merely  a  collection  of  contiguous  exons  that  are  spliced  together  may  also  be  wrong.  Many 
transcripts  generated  are  actually  produced  from  quite  disparate  chromosomal  regions  (this  was  revealed  most 
convincingly  by  doing  mate-pair  sequence  analysis  of  large  numbers  of  transcripts  using  Next  Generation  DNA 
sequencing  technology).  In  addition,  there  are  cryptic  exons  that  are  sometimes  hundreds  of  kilobases  upstream 
or  downstream  from  the  simple  organized  gene.  Indeed,  many  of  the  highly  conserved  non-coding  transcripts 
that  we  were  characterizing  were  found  to  be  linked  to  known  genes  either  due  to  extensive  homology  to  those 
known  genes  or  because  they  actually  corresponded  to  those  distant  exons.  However,  the  group  of  Dr.  Carlo 
Croce  recently  described  the  identification  of  a  number  of  highly  conserved  long  non-coding  transcripts  which 
appear  to  be  bona-fide  non-coding  transcripts  that  are  altered  in  human  leukemias  and  carcinomas  (Calin  GA  et 
al:  Ultraconserved  regions  encoding  ncRNAs  are  altered  in  human  leukemias  and  carcinomas.  Cancer  Cell 
2007; 12: 215-229). Fortunately, we had two additional sources of non-coding transcripts that we began to
analyze in greater detail.

Research  Accomplishment  1:  Identify  NNK-induced  long  non-coding  transcripts  (NiTs) 

For our initial tiling array experiment to identify possible non-coding transcripts, we examined a
normal human bronchial epithelial cell line exposed to the DNA-damaging tobacco carcinogen NNK
(4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone).

We  analyzed  the  entire  genome  in  response  to  NNK  stress  looking  for  long  potential  non-coding  transcripts 
(longer  than  300  base  pairs)  that  were  either  induced  or  repressed  by  this  stress.  This  analysis  identified 
transcripts that were induced by exposure to NNK (called NiTs for NNK-induced non-coding transcripts) and
transcripts that were suppressed by exposure to NNK (called NsTs for NNK-suppressed non-coding transcripts;
Figures 2 and 3, respectively).

We continued to focus our attention on longer non-coding transcripts as we attempted to look for novel
transcripts that were neither miRNAs nor potential miRNA precursors. It is interesting and important to note
that two previously reported non-coding transcripts, tncRNA and MALAT-1, which are located adjacent to each
other on chromosome 11, were identified in this screen as being induced by exposure to NNK and thus qualify
as NiTs. Figure 1 below shows the Integrated Genome Browser (IGB) results with tncRNA and MALAT-1, as
well as several of the newly identified stress-responsive non-coding transcripts.
Figure 1: NNK induction of long non-coding RNAs tncRNA and MALAT-1
Figure 2: NNK-induced non-coding transcript (NCT)
Figure 3: NNK-suppressed NCT
We next began to characterize the NiTs more fully; we have not yet begun to work on the NsTs. We identified
over 231,305 NNK-responsive transcriptionally active regions (NNK-TARs) across the entire human genome, with
lengths ranging from 100 nt to greater than 3,000 nt. Of these TARs, 119,305 were induced by NNK, while 112,000
were decreased by NNK treatment. So far, we have analyzed the NNK-induced TARs further by excluding
transcripts smaller than 300 nt, thereby identifying 1,305 long NNK-induced TARs (Figure 4). We then used
stringent criteria to identify true long NNK-induced non-coding transcripts. These criteria required
non-coding potential, verified by a lack of significant open reading frames; no homology to any RefSeq genes
or to any potential ncRNAs (C/D and H/ACA snoRNAs and microRNAs); and no homology to any known mRNAs.
In addition, the criteria demanded that NiTs not contain duplications or repeats. This intensive final
analysis of the NNK-TARs allowed us to identify 12 NiTs (Figure 5), excluding tncRNA and MALAT-1.
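The selection criteria described above can be summarized as a chain of filters. The following is only a minimal sketch of that logic: the `Tar` record, its field names, and the idea that each criterion has already been reduced to a yes/no flag are assumptions made for illustration, not a description of the software actually used in this work.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_LENGTH_NT = 300  # length cutoff stated in the text


@dataclass
class Tar:
    """One transcriptionally active region (all field names are hypothetical)."""
    name: str
    length_nt: int
    induced_by_nnk: bool
    has_significant_orf: bool          # coding potential
    matches_refseq_or_mrna: bool       # homology to RefSeq genes or known mRNAs
    matches_known_ncrna: bool          # homology to snoRNAs or microRNAs
    has_repeats_or_duplications: bool


def is_candidate_nit(tar: Tar) -> bool:
    """Return True only if the TAR passes every criterion listed in the text."""
    return (
        tar.induced_by_nnk
        and tar.length_nt >= MIN_LENGTH_NT
        and not tar.has_significant_orf
        and not tar.matches_refseq_or_mrna
        and not tar.matches_known_ncrna
        and not tar.has_repeats_or_duplications
    )


def select_candidate_nits(tars: Iterable[Tar]) -> List[Tar]:
    """Keep the TARs that satisfy all criteria, preserving input order."""
    return [tar for tar in tars if is_candidate_nit(tar)]
```

In practice each flag would come from a separate analysis (for example, an open reading frame search or a homology search), and the thresholds for "significant" would need to be defined explicitly.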
Figure 4: Identification of NiTs (NNK-induced non-coding transcripts)


We also generated probes for these stress-responsive non-coding
transcripts and hybridized them to Northern blots to determine
the size of the putative non-coding transcripts. Figure 9 shows
Northern blots for two representative NiTs (NiT2 and NiT6).
This analysis revealed that the full size of these transcripts was
greater than that determined on the tiling arrays.
Figure 5: Locations of the 12 NiTs and their increased transcript levels, shown in the Integrated Genome Browser
Research Accomplishment 2: Validate NiT expression levels using quantitative real-time PCR (qPCR)
and Northern blot analysis

Optimal concentrations of specific primers were determined for quantitative real-time PCR (qPCR). Once this
was accomplished, we began to analyze panels of random-primed cDNAs made from different RNA samples.
These included a panel of RNAs isolated from various normal human tissues (Figure 6), a panel of normal
and cancer-derived breast cell lines (Figure 7 and Table 1), and normal human bronchial epithelial (NHBE) and
MCF7 cell lines exposed to NNK (Figure 8). The goals were to validate the tiling array results by determining
whether these transcripts were indeed stress responsive, to examine the spectrum of their expression in
different tissues, and finally to determine whether they have altered expression in both breast cancer cell
lines and primary tumors.
Figure 7: NiT expression in breast cancer panel
Table 1: NiT expression in a breast cancer cell line panel, given as average delta CT relative to the actin control transcript, with standard deviation. * indicates significance, defined as a 2-fold difference compared to normal HMEC.
Figure 8: NiT expression in response to NNK treatment
Figure 9: NiT2 and NiT6 RNA transcripts in NHBE and breast cell lines
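Table 1 reports expression as an average delta CT relative to actin and flags a 2-fold difference from normal HMEC. The report does not state how fold differences were derived from the delta CT values; the sketch below assumes the widely used 2^-ddCT calculation with roughly 100% primer efficiency. All function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], actin_cts: Sequence[float]) -> float:
    """Average CT of the target transcript minus average CT of the actin control."""
    return mean(target_cts) - mean(actin_cts)


def fold_change(sample_delta_ct: float, hmec_delta_ct: float) -> float:
    """Relative expression versus HMEC by 2^-ddCT (assumes ~100% primer efficiency)."""
    return 2.0 ** -(sample_delta_ct - hmec_delta_ct)


def is_two_fold_different(fold: float) -> bool:
    """True if expression is at least 2-fold higher or 2-fold lower than the control."""
    return fold >= 2.0 or fold <= 0.5


# Made-up replicate CT values, for illustration only.
hmec = delta_ct([28.0, 28.5, 27.5], [18.0, 18.5, 17.5])
tumor_line = delta_ct([26.0, 26.5, 25.5], [18.0, 18.5, 17.5])
fold = fold_change(tumor_line, hmec)
print(f"fold change vs HMEC: {fold:.2f}, flagged: {is_two_fold_different(fold)}")
```

A lower delta CT means higher expression, which is why the exponent carries a minus sign.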


Of the 12 NiTs that we have been characterizing, we decided to focus all of our efforts on one particular NiT,
NiT 5. This NiT had increased expression in the majority of the breast cancer cell lines, as well as primary
tumors, that we have characterized. NiT5 was identified on chromosome 5 (2,768,026-2,768,369); it is
intergenic and lies several kilobases to megabases away from any nearby genes. NiT5 has high expression in
normal proliferative tissues such as colon and spleen and low expression in more slowly proliferating normal
tissues such as brain (Figure 10). NiT5 also has higher expression in panels of breast, ovarian, and cervical
cancer cell lines, but lower expression in an endometrial primary tissue panel (Figures 11 and 12).
Figure 11: NiT5 has high expression in breast, ovarian, and cervical cancer cell line panels
Figure 12: NiT5 has low expression in endometrial primary tissue panel
Research Accomplishment 3: Attempts to knock down expression of NiT5 transcript using siRNA
technology.

The last part of our proposed work on NiT 5 was to determine what effect would be observed in cells after we
had knocked down the expression of NiT 5 (using siRNA) in cell lines that had increased expression of NiT 5.
We chose two cell lines, BT474 (a breast cancer cell line) and OvCar 5 (an ovarian cancer cell line). We
constructed several siRNAs to attempt to knock down the expression of NiT 5, but none of the siRNA
constructs tested produced a significant decrease in NiT 5 expression. We tried several different
approaches to transfect functional siRNAs into the cell lines and then realized that the problem might be
that we were dealing with a non-coding transcript that is localized in the nucleus, whereas all of our
approaches for getting the siRNAs into the cells would localize them to the cytoplasm. We then searched
for protocols designed to target nuclear-localized RNAs and eventually found one that does result in decreased
NiT 5 expression. Figure 13 summarizes the level of knock-down of NiT 5 in BT474 and demonstrates that we
can indeed decrease the nuclear expression of NiT 5. Unfortunately, these results were obtained only several
weeks ago (after the termination date of this grant). As a result, we do not yet have any data
on the phenotypic effect of this knock-down.


Figure 13: siRNA knockdown of NiT5 expression in BT474 (axis label: NiT5 expression)


In last year's progress report we discussed performing knock-downs of both NiT 5 and NiT 4, but
because of the difficulty that we had in knocking down the expression of nuclear-localized non-coding
transcripts, we have not yet begun to work on decreasing the expression of NiT 4.

Research  Accomplishment  4:  Identification  of  long  non-coding  transcripts  altered  in  breast  cancer 
(bcNCTs) 

The second source of potential non-coding transcripts that we are studying was derived from an experiment that
we conducted to search directly for non-coding transcripts that were altered in breast cancer cell lines. We
took the normal breast epithelial cell line HMEC and five breast cancer cell lines (HCC1500, HCC1569,
MDA157, T47D, and MDA435). Total RNA was isolated from each of these and then hybridized to a single
tiling array chip in triplicate. While our initial studies were completed with 14 tiling array chips covering the
entire genome, we decided to conduct a more focused and considerably less expensive experiment, examining
only a single tiling array chip containing probes covering three human chromosomes (chromosomes 8, 11,
and 12). We then searched for non-coding transcripts with altered expression in more than one breast cancer cell
line, relative to HMEC. While there were many transcripts altered in just one of the cell lines, common
alterations in multiple cell lines were few. There were 24 transcripts altered in two cell lines, and 1 transcript
altered in three cell lines. No transcripts were found to be altered in all four cell lines, and these transcripts had
no correlation with NNK-responsive NCTs. The transcripts that had these consistent alterations were called
bcNCTs (for breast cancer Non-Coding Transcripts).
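The search for transcripts altered in more than one cell line amounts to counting, for each transcript, the number of cancer cell lines in which it differs from HMEC. A minimal sketch of that tally follows; it assumes the per-cell-line calls of "altered relative to HMEC" have already been made, and the function names and transcript identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def recurrence_counts(altered_by_line: Dict[str, Iterable[str]]) -> Counter:
    """Map each transcript ID to the number of cell lines in which it is altered."""
    counts: Counter = Counter()
    for transcripts in altered_by_line.values():
        # set() ensures a transcript is counted at most once per cell line
        counts.update(set(transcripts))
    return counts


def recurrent_transcripts(altered_by_line: Dict[str, Iterable[str]],
                          min_lines: int = 2) -> List[str]:
    """Return sorted IDs of transcripts altered in at least `min_lines` cell lines."""
    counts = recurrence_counts(altered_by_line)
    return sorted(t for t, n in counts.items() if n >= min_lines)


# Toy input with made-up transcript IDs; not data from this study.
example = {
    "HCC1500": ["t1", "t2"],
    "HCC1569": ["t2", "t3"],
    "T47D": ["t2", "t4", "t1"],
}
print(recurrent_transcripts(example))
```

Raising `min_lines` reproduces the stricter cuts described in the text (altered in two, three, or all cell lines).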

Research Accomplishment 5: Validate expression levels of bcNCT transcripts using quantitative real-time
PCR (qPCR)

The 24 bcNCTs altered in two cell lines were then characterized exactly as the NiTs were characterized.
Real-time PCR primers were designed and optimized to verify whether bcNCT expression was indeed
consistently altered in a much larger panel of breast cancer cell lines, as well as primary tumors. We also
characterized the bcNCTs to determine their expression in normal human tissues and to determine whether
they were stress responsive, as identifying stress-responsive transcripts was our original goal with the stress
response tiling array experiment. It is important to note here that the two previously characterized non-coding
transcripts, tncRNA and MALAT-1, which have increased expression in response to DNA damage induced by
NNK, also have increased expression in several of the breast cancer cell lines. This provides good support for
our hypothesis that stress-responsive non-coding transcripts are good candidates for transcripts that also have
altered expression during the development of breast cancer. The figures below show several of the bcNCTs
that we have chosen for further study.


Figure 14: Expression patterns of bcNCT transcripts (panel title: bcNCTs response to NNK treatment)

We also previously discussed our plans to decrease the expression of bcNCT 28, but because of the difficulty
that we had in knocking down NiT 5 expression, we did not have time to examine what effect alterations
in bcNCT 28 would have on different cell lines.


A second experiment that we originally proposed, but did not have sufficient time to conduct, was to test the
effect of increasing the expression of NiT 4, NiT 5, and bcNCT 28 in cell lines derived from normal human
bronchial epithelial (NHBE) cells. Our hypothesis was that this would result in, at a minimum, increased
cellular growth and potentially more profound phenotypic changes. These experiments are currently underway
for the NiT5 construct. Over-expression constructs are being created by cloning full-length NiT5 into a TOPO
dual-promoter (SP6/T7) vector and creating plasmid constructs for transfection into NHBE cell lines. These
constructs will be transfected using RNAifect reagents, and the resulting transformants will be used to conduct
functional studies of migration and proliferation.

It should be mentioned that, thanks to the Department of Defense Breast Cancer Research Program, we have just
received a Breast Cancer Idea award to continue our studies on NiT 5. We are currently performing these
experiments and will then be able to determine the effect of altering expression both in cancer-derived cell
lines with increased expression of these non-coding transcripts and in normal tissue-derived cell lines in which
non-coding transcript expression is increased.


Another set of experiments that we proposed to conduct, but have not yet completed because of our difficulty in
the knock-down experiments, was to use Next Generation RNA sequencing to examine the genes that have
altered expression after either increasing or decreasing the expression of NiT 5. These experiments will
instead be conducted as part of our Breast Cancer Idea award. We are currently concluding a set of successful
experiments to optimize the siRNA transfection system for knocking down expression of NiT5 transcripts in
BT474 cells in culture. We will then begin sequencing RNA isolated from BT474 cells and from BT474 cells
with siRNA-reduced expression of NiT5, using whole transcriptome sequencing technology on the Illumina
platform. Pathway analysis will be performed to identify transcripts, genes and pathways that are
differentially regulated following silencing of NiT5.
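As a rough illustration of the comparison planned above, the sketch below normalizes raw read counts to counts per million (CPM) and computes a log2 fold change between control and NiT5-knockdown samples. This is only a simplified sketch under stated assumptions: the function names, the pseudocount, and the single-sample comparison are all illustrative, and an actual analysis would use replicates and a dedicated statistical package rather than this calculation.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_changes(control_counts: Dict[str, int],
                      knockdown_counts: Dict[str, int],
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(knockdown CPM / control CPM) for genes present in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one sample.
    """
    control_cpm = counts_per_million(control_counts)
    knockdown_cpm = counts_per_million(knockdown_counts)
    shared = control_cpm.keys() & knockdown_cpm.keys()
    return {
        gene: math.log2((knockdown_cpm[gene] + pseudocount)
                        / (control_cpm[gene] + pseudocount))
        for gene in shared
    }
```

Negative values indicate lower expression after knock-down and positive values indicate higher expression; genes passing a chosen threshold would then feed into pathway analysis.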


We  would  like  to  thank  the  Department  of  Defense  Breast  Cancer  Program  for  supporting  our  work.  While  it  is 
still  in  the  preliminary  stages,  we  remain  hopeful  that  this  work  will  identify  important  new  targets  of  alteration 
during the development of breast cancer. We have submitted an R01 proposal to the NIH on some of the
non-coding transcripts, but as expected, the first submission was triaged out. However, with additional supportive
results (which we hope to obtain with Department of Defense support), this grant should be more competitive.


KEY  RESEARCH  ACCOMPLISHMENTS 

(1) Identified NNK-induced long non-coding transcripts (NiTs).

(2) Validated these NiTs both with real-time RT-PCR and with Northern blot analysis.

(3)  Attempted  to  knock  down  the  expression  of  NiT  5  using  siRNA  technology. 

(4)  Identified  breast  cancer  altered  long  non-coding  transcripts  (bcNCTs). 

(5)  Validated  these  bcNCT  transcripts  both  with  real-time  RT-PCR  and  with  Northern  blots. 